data:image/s3,"s3://crabby-images/55269/55269d04f1e98ff9dec4e3966483e3db7449733d" alt="Thumb"
Perl pattern matching and extraction
Perl has regular expression operators for identifying patterns. The operator
/regular expression/
returns true or false depending on whether the regular expression matches the contents of the default variable $_. For example:
if (/perl/)
{
    print "String contains perl as a substring\n";
}
if (/(Satur|Sun)day/)
{
    print "Weekend day....\n";
}
The effect is rather like the grep command. To use this operator on a variable other than $_, you would write:
$variable =~ /regexp/
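A minimal sketch of the binding operator on a named variable. The variable name $day and its value are illustrative only, and the alternation is written as (Satur|Sun) so that "Saturday" also matches. It also shows !~, the negated form of =~.

```perl
use strict;
use warnings;

# Illustrative variable; any scalar can be matched with =~
my $day = "Saturday";

# =~ applies the match to $day instead of the default variable $_
my $is_weekend = ($day =~ /(Satur|Sun)day/) ? 1 : 0;

# !~ is the negated form: true when the pattern does NOT match
my $lacks_perl = ($day !~ /perl/) ? 1 : 0;

print "weekend: $is_weekend, no 'perl': $lacks_perl\n";
# prints "weekend: 1, no 'perl': 1"
```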
Regular expressions can contain parenthesized sub-expressions, e.g.
if (/(Satur|Sun)day(..)th(.*)/)
{
    $first = $1;
    $second = $2;
    $third = $3;
}
in which case Perl places the text matched by each sub-expression in the variables $1, $2, etc.
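To see what ends up in $1, $2 and $3, here is a small sketch run against sample strings chosen (as an assumption, for illustration) so that the pattern above matches: exactly two characters must sit between "day" and "th". It reads the capture variables only inside the if, because after a failed match they keep the values from the last successful one. It also shows that a match in list context returns the captured groups directly.

```perl
use strict;
use warnings;

# Sample text chosen so the pattern matches: " 4" sits between "day" and "th"
my $text = "Sunday 4th of July";

my ($first, $second, $third);
if ($text =~ /(Satur|Sun)day(..)th(.*)/)
{
    # $1, $2, $3 hold the text matched by each parenthesized group
    ($first, $second, $third) = ($1, $2, $3);
}
print "[$first] [$second] [$third]\n";   # prints "[Sun] [ 4] [ of July]"

# In list context a match returns the captured groups as a list
my @parts = ("Saturday 5th June" =~ /(Satur|Sun)day(..)th(.*)/);
print join("|", @parts), "\n";           # prints "Satur| 5| June"
```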
Perl string replace and searching
The sed-like function of replacing all occurrences of a string is easily implemented in Perl using:
while (<INPUT>)
{
    s/$search/$replace/g;
    print OUTPUT;
}
This example replaces the string inside the default variable $_. To replace in a general variable we use the binding operator =~, with the syntax:
$variable =~ s/search/replace/
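A short sketch of substitution on a named variable, using made-up sample strings. It shows two details worth knowing: s///g returns the number of substitutions it made, and a search string is treated as a regular expression, so metacharacters such as "." need \Q...\E to be matched literally.

```perl
use strict;
use warnings;

my $variable = "red car, red hat, blue car";

# With /g, s/// replaces every match; in scalar context it returns
# the number of substitutions made
my $count = ($variable =~ s/red/green/g);
print "$variable ($count changes)\n";
# prints "green car, green hat, blue car (2 changes)"

# A search string containing regex metacharacters, e.g. "3.0":
# without \Q...\E the "." matches any character, so "310" matches too
my $search  = "3.0";
my $version = "3.0 and 310";
(my $unquoted = $version) =~ s/$search/X/g;
(my $quoted   = $version) =~ s/\Q$search\E/X/g;
print "$unquoted\n$quoted\n";
# prints "X and X" then "X and 310"
```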
Here is an example of this operator in use. The following program searches for a string and replaces it in several files, which makes it a useful tool for making a change globally across a group of files. The program is called 'file-replace'.
#!/local/bin/perl
# file-replace: look through files for a "find" string and change it
# to a new string in all the files named on the command line.

# Define a temporary file and make sure it doesn't already exist
$outputfile = "tmpmarkfind";
unlink $outputfile;

# Check the command line for a list of files
if ($#ARGV < 0)
{
    die "Syntax: file-replace [file list]\n";
}

print "Enter the string you want to find (Don't use quotes):\n\n:";
$findstring = <STDIN>;
chomp $findstring;
print "Enter the string you want to replace with (Don't use quotes):\n\n:";
$replacestring = <STDIN>;
chomp $replacestring;

print "\nFind: $findstring\n";
print "Replace: $replacestring\n";
print "\nConfirm (y/n) ";
$y = <STDIN>;
chomp $y;
if ($y ne "y")
{
    die "Aborted -- nothing done.\n";
}
else
{
    print "Use CTRL-C to interrupt...\n";
}

# Now shift the default array @ARGV to get the arguments one by one
while ($file = shift)
{
    if ($file eq "file-replace")
    {
        print "file-replace will not operate on itself!\n";
        next;
    }
    # Save the existing mode of the file for later
    ($dev, $ino, $mode) = stat($file);
    # Skip files that can't be opened instead of processing them anyway
    unless (open(INPUT, '<', $file))
    {
        warn "Couldn't open $file\n";
        next;
    }
    open(OUTPUT, '>', $outputfile) || die "Can't open temporary file $outputfile\n";
    $notify = 1;
    while (<INPUT>)
    {
        # \Q...\E makes the find string match literally, not as a regex
        if (/\Q$findstring\E/ && $notify)
        {
            print "Fixing $file...\n";
            $notify = 0;
        }
        s/\Q$findstring\E/$replacestring/g;
        print OUTPUT;
    }
    close(INPUT);
    close(OUTPUT);
    # If nothing went wrong (the output file is not empty), move the
    # temp file over the original and restore the file mode saved above
    if (!-z $outputfile)
    {
        rename($outputfile, $file);
        chmod($mode & 07777, $file);
    }
    else
    {
        print "Warning: file empty!\n";
    }
}
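For comparison, Perl also has built-in in-place editing: setting the special variable $^I (the same switch as the command-line option perl -i.bak) makes <> write each file's output back to that file and keep a backup copy. This is a different technique from the temporary-file approach of file-replace, shown here as a sketch on a scratch file; the file name and the "old"/"new" strings are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch file to edit (illustrative content)
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.txt";
open(my $fh, '>', $file) or die "Can't create $file: $!";
print $fh "old value\nanother old value\n";
close($fh);

my ($find, $replace) = ("old", "new");
{
    # $^I turns on in-place editing for <>, like perl -i.bak;
    # the original contents are kept in example.txt.bak
    local $^I   = '.bak';
    local @ARGV = ($file);
    while (<>)
    {
        s/\Q$find\E/$replace/g;
        print;      # goes to the rewritten file, not to STDOUT
    }
}

# Read the edited file back to check the result
open($fh, '<', $file) or die "Can't reopen $file: $!";
my $result = do { local $/; <$fh> };
close($fh);
print $result;
```

The equivalent one-liner from the shell would be perl -pi.bak -e 's/old/new/g' followed by the file names.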
Perl regular expression
regex '.*'
- matches every line (.* matches anything, even an empty string)
regex '.'
- matches every line that contains at least one character other than a newline, so only empty lines fail (. matches any character except a newline, including spaces and tabs)
regex '[a-z]'
- matches any line containing a lowercase letter
regex '[^a-z]'
- matches any line containing a character which is not a lowercase letter a-z (if the line still ends in "\n", the newline itself counts)
regex '[A-Za-z]'
- matches any line containing letters of any kind
regex '[0-9]'
- matches any line containing a digit
regex '#.*'
- matches a line containing a hash symbol followed by anything
regex '^#.*'
- matches a line starting with a hash symbol (first character)
regex ';\n'
- matches a line ending in a semicolon (for a line that has been chomped, use ';$')
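A quick way to check the list above is to run several of the patterns over a handful of sample lines and print which ones match. The sample lines are made up, and they carry no trailing newline, so ';$' stands in for ';\n' here.

```perl
use strict;
use warnings;

# Sample lines without trailing newlines (as if read and chomped)
my @lines = ("# a comment", "x = 1;", "ABC", "", "   ");

# A few patterns from the list above; ';$' replaces ';\n' because
# these lines carry no newline
my %patterns = (
    '.'        => qr/./,
    '[a-z]'    => qr/[a-z]/,
    '[A-Za-z]' => qr/[A-Za-z]/,
    '[0-9]'    => qr/[0-9]/,
    '^#.*'     => qr/^#.*/,
    ';$'       => qr/;$/,
);

my %matches;
for my $name (sort keys %patterns)
{
    # grep keeps only the lines the pattern matches, like the grep command
    $matches{$name} = [ grep { $_ =~ $patterns{$name} } @lines ];
    print "$name: ", join(" ", map { "[$_]" } @{ $matches{$name} }), "\n";
}
```

Note that '.' matches the line of three spaces but not the empty line.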
Example: convert mail to WWW pages
Here is an example program which you could use to automatically turn a mail message of the form
From: Newswire
To: Mail2html
Subject: Nothing happened
On the 13th February at kl. 09:30 nothing happened. No footprints were found leading to the scene of a terrible murder, no evidence of a struggle .... etc etc
into an HTML file for the World Wide Web. The program works by extracting the message body and subject from the mail and writing HTML tags around them to make a web page. The subject field of the mail becomes the heading of the page. The other headers are skipped: the script recognizes them as lines that begin with a word followed by the sequence colon-space (': '). A regular expression is used for this.
#!/local/bin/perl
# Make HTML from mail
&BeginWebPage();
&ReadNewMail();
&EndWebPage();

sub BeginWebPage
{
    print "<HTML>\n";
    print "<BODY>\n";
}

sub EndWebPage
{
    print "</BODY>\n";
    print "</HTML>\n";
}

sub ReadNewMail
{
    while (<>)
    {
        if (/^Subject:/)            # Search for the subject line
        {
            # Extract the subject text after the first colon
            chomp;
            ($left, $right) = split(/:\s*/, $_, 2);
            print "<H1> $right </H1>\n";
            next;
        }
        elsif (/^[\w-]+: /)         # Search for "Word: anything"
        {
            next;                   # skip other headers
        }
        print;                      # print the message body
    }
}
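The same idea can be packaged as a function that takes the whole message as a string and returns the page, which makes it easy to try out. This is a sketch with a hypothetical helper name, mail_to_html, and a made-up sample message. Unlike the program above, it also remembers when the header block has ended, so a body line such as "Note: ..." is kept rather than mistaken for a header.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one mail message (a string) to HTML.
# Header detection assumes headers look like "Word: value".
sub mail_to_html
{
    my ($mail) = @_;
    my $html = "<HTML>\n<BODY>\n";
    my $in_headers = 1;
    for my $line (split /\n/, $mail)
    {
        if ($in_headers)
        {
            if ($line =~ /^Subject:\s*(.*)/)
            {
                $html .= "<H1> $1 </H1>\n";
                next;
            }
            next if $line =~ /^[\w-]+: /;   # skip other headers
            $in_headers = 0;                # first non-header line starts the body
        }
        $html .= "$line\n";
    }
    return $html . "</BODY>\n</HTML>\n";
}

my $page = mail_to_html("From: Newswire\nSubject: Nothing happened\n\nAt 09:30 nothing happened.\n");
print $page;
```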
Generate perl web pages / WWW
The following program scans through the password database and builds a standardized HTML page for each user it finds there. It fills in the name of the user on each page. Note the use of the '<<' operator (a "here-document") for extended input, already used in the context of the shell (see the section on pipes and redirection). This allows us to format a whole passage of text, inserting variables at strategic places, and avoid having to use print over many lines.
#!/local/bin/perl
# Build a default home page for each user in /etc/passwd

# First build an associative array (hash) of users and full names
setpwent();
while (($name, $passwd, $uid, $gid, $quota, $comment, $fullname) = getpwent())
{
    $FullName{$name} = $fullname;
    print "$name - $FullName{$name}\n";
}
endpwent();
print "\n";

# Now make a unique filename for each page and open a file
foreach $user (sort keys(%FullName))
{
    next if ($user eq "");
    print "Making page for $user\n";
    $outputfile = "$user.html";
    open(OUT, "> $outputfile") || die "Can't open $outputfile\n";
    &MakePage;
    close(OUT);
}

sub MakePage
{
    print OUT <<ENDMARKER;
<HTML>
<HEAD>
<TITLE>$FullName{$user}'s Home Page</TITLE>
</HEAD>
<BODY>
<H1>$FullName{$user}'s Home Page</H1>
Hi, welcome to my home page. In case you hadn't got it yet, my name is:
$FullName{$user}...
I study at <a href="http://www.abctutorial.com">Ontario,Canada</a>
</BODY>
</HTML>
ENDMARKER
}
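The field stored as the full name is the GCOS field, the seventh value returned by getpwent. On many systems it holds several comma-separated entries (full name, room, phone numbers), so a page would then show the whole string. The sketch below uses made-up sample data instead of the real password database. It keeps only the text before the first comma, falls back to the login name when the field is empty, and fills in a here-document inside a hypothetical make_page function that returns the page as a string.

```perl
use strict;
use warnings;

# Sample GCOS values of the kind getpwent() returns as its seventh
# field (made-up data, not read from a real /etc/passwd)
my %gcos = (
    alice => "Alice Smith,Room 12,555-0100,",
    svc   => "",
);

# Keep the text before the first comma; fall back to the login name
sub full_name
{
    my ($user, $gcos) = @_;
    my ($name) = split /,/, $gcos;
    return (defined $name && length $name) ? $name : $user;
}

# Hypothetical variant of MakePage that returns the page as a string
sub make_page
{
    my ($name) = @_;
    # The unquoted marker makes the here-document interpolate $name
    return <<ENDMARKER;
<HTML>
<BODY>
<H1>Home page of $name</H1>
</BODY>
</HTML>
ENDMARKER
}

my %pages;
for my $user (sort keys %gcos)
{
    $pages{$user} = make_page(full_name($user, $gcos{$user}));
}
print $pages{alice};
```

Writing the marker as <<'ENDMARKER' (single-quoted) would switch interpolation off and leave $name in the page literally.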